Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
In the last decade, data analytics has rapidly progressed from traditional
disk-based processing to modern in-memory processing. However, little effort
has been devoted to enhancing performance at the micro-architecture level. This
paper characterizes the performance of in-memory data analytics using the
Apache Spark framework. We use a single-node NUMA machine and identify the
bottlenecks hampering the scalability of workloads. We also quantify the
inefficiencies at the micro-architecture level for various data analysis
workloads. Through empirical evaluation, we show that Spark workloads do not
scale linearly beyond twelve threads, due to work time inflation and
thread-level load imbalance. Further, at the micro-architecture level, we
observe memory-bound latency to be the major cause of work time inflation.
Comment: Accepted to the 5th IEEE International Conference on Big Data and
Cloud Computing (BDCloud 2015).
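To make the scaling claim concrete, the following is a minimal sketch, assuming
a local PySpark installation, of how one might probe thread-count scaling and
work time inflation with local[N] masters; the aggregation workload, row count,
and thread counts are illustrative assumptions, not the paper's benchmarks.

    # Sketch: probe Spark thread scaling; growth of n * t beyond the
    # single-thread time is a crude sign of work time inflation.
    import time
    from pyspark.sql import SparkSession

    def run_with_threads(n_threads, n_rows=10_000_000):
        spark = (SparkSession.builder
                 .master(f"local[{n_threads}]")
                 .appName("scaling-probe")
                 .getOrCreate())
        start = time.perf_counter()
        # Simple aggregation; a stand-in for the paper's workloads.
        spark.range(n_rows).selectExpr("sum(id * 2)").collect()
        elapsed = time.perf_counter() - start
        spark.stop()
        return elapsed

    if __name__ == "__main__":
        base = run_with_threads(1)
        for n in (2, 4, 8, 12, 16, 24):
            t = run_with_threads(n)
            print(f"threads={n:2d} time={t:6.2f}s speedup={base / t:5.2f} "
                  f"aggregate-thread-time={n * t:7.2f}s")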
Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise
improved performance for applications by reducing the data movement between the
CPU and memory. However, detecting such applications is not a trivial task. In
this ongoing work, we extend a state-of-the-art platform-independent software
analysis tool with NMC-related metrics such as memory entropy, spatial
locality, data-level parallelism, and basic-block-level parallelism. These
metrics help identify the applications that are more suitable for NMC
architectures.
Comment: 22nd ACM International Workshop on Software and Compilers for
Embedded Systems (SCOPES '19), May 2019.
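As a toy illustration of two of these metrics, here is a small sketch over a
flat trace of byte addresses; the block size, window, and formulas are
simplifying assumptions, not the tool's actual implementation.

    # Toy versions of memory entropy and spatial locality on an address trace.
    import math
    from collections import Counter

    def memory_entropy(trace, block_bits=6):
        # Shannon entropy of accessed 64-byte cache blocks (block_bits=6).
        counts = Counter(addr >> block_bits for addr in trace)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in counts.values())

    def spatial_locality(trace, window=64):
        # Fraction of accesses within `window` bytes of the previous one,
        # a crude proxy for cache-friendly, stride-1 behavior.
        near = sum(1 for a, b in zip(trace, trace[1:]) if abs(b - a) <= window)
        return near / max(len(trace) - 1, 1)

    sequential = list(range(0, 4096, 8))          # dense stride-8 sweep
    scattered  = list(range(0, 4096 * 512, 512))  # large-stride sweep
    print(memory_entropy(sequential), spatial_locality(sequential))  # low, high
    print(memory_entropy(scattered), spatial_locality(scattered))    # high, low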
NMPO: Near-Memory Computing Profiling and Offloading
Real-world applications now process big data sets and are often bottlenecked by the data movement between the compute units and the main memory. Near-memory computing (NMC), a modern data-centric computational paradigm, can alleviate these bottlenecks, thereby improving the performance of applications. The lack of NMC system availability makes simulators the primary evaluation tool for performance estimation. However, simulators are usually time-consuming, and methods that can reduce this overhead would accelerate the early-stage design process of NMC systems. This work proposes Near-Memory computing Profiling and Offloading (NMPO), a high-level framework capable of predicting NMC offloading suitability employing an ensemble machine learning model. NMPO predicts NMC suitability with an accuracy of 85.6% and, compared to prior works, can reduce the prediction time by up to three orders of magnitude by using hardware-dependent application features.
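The ensemble-prediction idea can be sketched as follows; the feature set,
labeling rule, and model choice are hypothetical stand-ins, not NMPO's actual
design, and the synthetic data exists only to make the example runnable.

    # Sketch: an ensemble classifier predicting NMC offload suitability.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Hypothetical per-kernel features: memory entropy, spatial locality,
    # data-level parallelism, last-level-cache miss ratio.
    X = rng.random((1000, 4))
    # Toy label: poor locality plus a high miss ratio favors offloading.
    y = ((X[:, 1] < 0.4) & (X[:, 3] > 0.5)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    print("offload-prediction accuracy:",
          accuracy_score(y_te, model.predict(X_te)))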
Near Memory Acceleration on High Resolution Radio Astronomy Imaging
Modern radio telescopes like the Square Kilometre Array (SKA) will need to
process exabytes of radio-astronomical signals in real time to construct a
high-resolution map of the sky. Near-Memory Computing (NMC) could alleviate the
performance bottlenecks due to frequent memory accesses in a state-of-the-art
radio-astronomy imaging algorithm. In this paper, we show that a sub-module
performing a two-dimensional fast Fourier transform (2D FFT) is memory bound,
using CPI breakdown analysis on IBM POWER9. Then, we present an NMC approach on
FPGA for the 2D FFT that outperforms a CPU by up to 120x and performs
comparably to a high-end GPU, while using less bandwidth and memory.
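To see why a 2D FFT tends to be memory bound, note that it decomposes into 1D
FFTs over rows followed by 1D FFTs over columns, and the column pass strides
through memory by a full row. The sketch below, with an illustrative grid
size, demonstrates this decomposition; NumPy computes both passes the same
way, so it illustrates the access pattern, not the POWER9 measurement.

    # Sketch: 2D FFT as a row pass plus a column pass.
    import numpy as np

    grid = np.random.rand(2048, 2048)

    rows = np.fft.fft(grid, axis=1)  # row pass: unit-stride, cache-friendly
    full = np.fft.fft(rows, axis=0)  # column pass: strides by a whole row,
                                     # the access pattern that misses in
                                     # cache on large grids

    # The two passes together equal the library's 2D FFT.
    assert np.allclose(full, np.fft.fft2(grid))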
Main Theorems of Differential and Integral Calculus: A Guide for Use in Lectures / compiled by Robert Fricke; Part 1
The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory, called near-memory computing (NMC), more viable. Processing right at the "home" of data can significantly diminish the data movement problem of data-intensive applications. In this paper, we survey the prior art on NMC across various dimensions (architecture, applications, tools, etc.) and identify the key challenges and open issues with future research directions. We also provide a glimpse of our approach to near-memory computing, which includes i) NMC-specific microarchitecture-independent application characterization, ii) a compiler framework to offload the NMC kernels on our target NMC platform, and iii) an analytical model to evaluate the potential of NMC.
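As a back-of-the-envelope flavor of what such an analytical model can look
like, here is an Amdahl-style sketch; the split of runtime into compute and
data movement, the near-memory gain, and the overhead term are illustrative
assumptions, not the model from the paper.

    # Sketch: estimated speedup from offloading the memory-bound fraction
    # of a kernel's runtime to near-memory compute units.
    def nmc_speedup(mem_fraction, mem_gain, offload_overhead=0.05):
        # Normalize host runtime to 1.0; the memory-bound share runs
        # mem_gain times faster near memory, plus a fixed offload cost.
        nmc_time = ((1.0 - mem_fraction) + mem_fraction / mem_gain
                    + offload_overhead)
        return 1.0 / nmc_time

    # A kernel spending 70% of its time moving data, with 4x faster
    # near-memory access, lands at roughly 1.9x.
    print(f"{nmc_speedup(0.7, 4.0):.2f}x")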
Project Night-King: Improving the performance of big data analytics using Near Data Computing Architectures
The goal of Project Night-King is to improve the single-node performance of scale-out big data processing frameworks like Apache Spark using programmable accelerators near DRAM and NVRAM. Using modeling techniques, we estimate a lower bound of 5x performance improvement for Spark MLlib workloads.